content moderation
CulturePark: Boosting Cross-cultural Understanding in Large Language Models
Cultural bias is pervasive in many large language models (LLMs), largely due to the deficiency of data representative of different cultures. Typically, cultural datasets and benchmarks are constructed either by extracting subsets of existing datasets or by aggregating from platforms such as Wikipedia and social media. However, these approaches are highly dependent on real-world data and human annotations, making them costly and difficult to scale. Inspired by cognitive theories on social communication, this paper introduces CulturePark, an LLM-powered multi-agent communication framework for cultural data collection. CulturePark simulates cross-cultural human communication with LLM-based agents playing roles in different cultures. It generates high-quality cross-cultural dialogues encapsulating human beliefs, norms, and customs. Using CulturePark, we generated 41,000 cultural samples to fine-tune eight culture-specific LLMs. We evaluated these models across three downstream tasks: content moderation, cultural alignment, and cultural education. Results show that for content moderation, our GPT-3.5-based
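As a rough illustration of the multi-agent setup described above, the sketch below has two LLM-backed agents, each prompted with a different cultural persona, take turns discussing a topic; the transcript becomes candidate fine-tuning data. The chat loop, persona prompts, and model name are assumptions for illustration, not CulturePark's actual implementation.

    from openai import OpenAI

    client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

    def run_dialogue(culture_a, culture_b, topic, turns=4):
        personas = {
            culture_a: f"You are a person from {culture_a}. Discuss the topic from your cultural perspective.",
            culture_b: f"You are a person from {culture_b}. Discuss the topic from your cultural perspective.",
        }
        transcript = [("moderator", f"Topic: {topic}")]
        for i in range(turns):
            speaker = culture_a if i % 2 == 0 else culture_b
            # Each agent sees the whole dialogue so far, with its own past turns
            # as "assistant" messages and everyone else's as "user" messages.
            messages = [{"role": "system", "content": personas[speaker]}]
            for who, text in transcript:
                role = "assistant" if who == speaker else "user"
                messages.append({"role": role, "content": text})
            reply = client.chat.completions.create(
                model="gpt-3.5-turbo", messages=messages
            ).choices[0].message.content
            transcript.append((speaker, reply))
        return transcript  # candidate cross-cultural samples for fine-tuning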
WIRED Roundup: DOGE Isn't Dead, Facebook Dating Is Real, and Amazon's AI Ambitions
In this episode of Uncanny Valley, we bring you the news of the week, then dive into how some DOGE operatives are still at work in the federal government--despite reports claiming otherwise. Uncanny Valley host Zoë Schiffer is joined by senior editor Leah Feiger to discuss five stories you need to know about this week, from how Amazon is trying to catch up in the AI race to why Facebook Dating is more popular than ever. Then, they dive into how--despite recent reports claiming that it's over--DOGE operatives are still very much working across federal agencies. Write to us at uncannyvalley@wired.com.

Zoë Schiffer: Today on the show, we're bringing you five stories that you need to know about this week, including how, despite some reports claiming that the so-called Department of Government Efficiency is pretty much over, DOGE people are actually still at work across federal agencies. I'm joined today by our senior politics editor, Leah Feiger. How are you doing today?

Leah Feiger: I am great because I've spent the day with you, but our gentle listeners don't know that.

Zoë Schiffer: So the first story this week is one that I saw and I thought, you know what? Leah's going to want to talk about Amazon's artificial intelligence prowess.
- North America > United States > California > Santa Clara County > Palo Alto (0.04)
- Europe > Slovakia (0.04)
- Europe > Czechia (0.04)
When Harmful Content Gets Camouflaged: Unveiling Perception Failure of LVLMs with CamHarmTI
Li, Yanhui, Zhou, Qi, Xu, Zhihong, Guo, Huizhong, Wang, Wenhai, Wang, Dongxia
Large vision-language models (LVLMs) are increasingly used for tasks where detecting multimodal harmful content is crucial, such as online content moderation. However, real-world harmful content is often camouflaged, relying on nuanced text-image interplay, such as memes or images with embedded malicious text, to evade detection. This raises a key question: can LVLMs perceive such camouflaged harmful content as sensitively as humans do? In this paper, we introduce CamHarmTI, a benchmark for evaluating LVLM ability to perceive and interpret camouflaged harmful content within text-image compositions. CamHarmTI consists of over 4,500 samples across three types of image-text posts. Experiments on 100 human users and 12 mainstream LVLMs reveal a clear perceptual gap: humans easily recognize such content (e.g., over 95.75% accuracy), whereas current LVLMs often fail (e.g., ChatGPT-4o achieves only 2.10% accuracy). Moreover, fine-tuning experiments demonstrate that CamHarmTI serves as an effective resource for improving model perception, increasing accuracy by 55.94% for Qwen2.5VL-7B. Attention analysis and layer-wise probing further reveal that fine-tuning enhances sensitivity primarily in the early layers of the vision encoder, promoting a more integrated scene understanding. These findings highlight the inherent perceptual limitations in LVLMs and offer insight into more human-aligned visual reasoning systems.
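To make the measurement concrete, here is a minimal sketch of the kind of evaluation such a benchmark implies: show an LVLM an image-text post, ask for a harmful/benign verdict, and score accuracy against gold labels. The dataset fields, prompt wording, and naive answer parsing are assumptions for illustration, not the paper's exact protocol.

    import base64
    from openai import OpenAI

    client = OpenAI()

    def classify(image_path, caption):
        with open(image_path, "rb") as f:
            b64 = base64.b64encode(f.read()).decode()
        resp = client.chat.completions.create(
            model="gpt-4o",
            messages=[{
                "role": "user",
                "content": [
                    {"type": "text",
                     "text": f"Post caption: {caption}\n"
                             "Is this post harmful? Answer 'harmful' or 'benign'."},
                    {"type": "image_url",
                     "image_url": {"url": f"data:image/png;base64,{b64}"}},
                ],
            }],
        )
        # Naive parse: any answer containing "harmful" counts as a harmful verdict.
        return "harmful" if "harmful" in resp.choices[0].message.content.lower() else "benign"

    def accuracy(samples):
        # samples: list of (image_path, caption, gold_label) triples
        hits = sum(classify(img, cap) == gold for img, cap, gold in samples)
        return hits / len(samples)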
- Information Technology > Artificial Intelligence > Vision (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.87)
AWS CEO Matt Garman Wants to Reassert Amazon's Cloud Dominance in the AI Era
As Google and Microsoft continue to surge, the AWS chief lays out his pitch: cheaper, reliable AI delivered at hyperscale. You might think Amazon's biggest swing in the AI race was its $8 billion investment in Anthropic. But AWS has also been building in-house foundation models, new chips, massive data centers, and agents meant to keep enterprise customers locked inside its ecosystem. The company believes these offerings will give it an edge as businesses of all shapes and sizes deploy AI in the real world. WIRED sat down with AWS CEO Matt Garman ahead of the company's annual re:Invent conference in Las Vegas to discuss his AI vision, and how he plans to extend Amazon's lead in the cloud market over its fast-rising competitors, Microsoft and Google.
- North America > United States > Nevada > Clark County > Las Vegas (0.25)
- North America > United States > California (0.15)
- Asia > Nepal (0.15)
- (2 more...)
- Information Technology > Services (1.00)
- Retail > Online (0.78)
- Information Technology > Cloud Computing (1.00)
- Information Technology > Communications > Mobile (0.70)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.41)
Ideology-Based LLMs for Content Moderation
Civelli, Stefano, Bernardelle, Pietro, Pratama, Nardiena A., Demartini, Gianluca
Large language models (LLMs) are increasingly used in content moderation systems, where ensuring fairness and neutrality is essential. In this study, we examine how persona adoption influences the consistency and fairness of harmful content classification across different LLM architectures, model sizes, and content modalities (language vs. vision). At first glance, headline performance metrics suggest that personas have little impact on overall classification accuracy. However, a closer analysis reveals important behavioral shifts. Personas with different ideological leanings display distinct propensities to label content as harmful, showing that the lens through which a model "views" input can subtly shape its judgments. Further agreement analyses highlight that models, particularly larger ones, tend to align more closely with personas from the same political ideology, strengthening within-ideology consistency while widening divergence across ideological groups. To show this effect more directly, we conducted an additional study on a politically targeted task, which confirmed that personas not only behave more coherently within their own ideology but also exhibit a tendency to defend their perspective while downplaying harmfulness in opposing views. Together, these findings highlight how persona conditioning can introduce subtle ideological biases into LLM outputs, raising concerns about the use of AI systems that may reinforce partisan perspectives under the guise of neutrality.
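A small sketch of the persona-conditioning setup under study: classify the same post under differently leaning personas and compare the labels, since systematic divergence across personas is exactly the behavioral shift the paper measures. The persona prompts and model choice are assumptions for illustration, not the authors' exact setup.

    from openai import OpenAI

    client = OpenAI()

    PERSONAS = {
        "left-leaning": "You are a politically left-leaning content moderator.",
        "right-leaning": "You are a politically right-leaning content moderator.",
        "neutral": "You are a strictly neutral content moderator.",
    }

    def judge(persona, text):
        resp = client.chat.completions.create(
            model="gpt-4o-mini",
            messages=[
                {"role": "system", "content": PERSONAS[persona]},
                {"role": "user",
                 "content": f"Label this post as HARMFUL or NOT_HARMFUL:\n{text}"},
            ],
        )
        return resp.choices[0].message.content.strip()

    def persona_labels(text):
        # Divergent labels across personas expose ideology-dependent judgments.
        return {name: judge(name, text) for name in PERSONAS}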
- Asia > Middle East > UAE > Abu Dhabi Emirate > Abu Dhabi (0.14)
- Oceania > Australia > Queensland > Brisbane (0.04)
- Asia > Middle East > Jordan (0.04)
- (3 more...)
- Government (0.67)
- Education (0.67)
- Telecommunications (0.46)
- Leisure & Entertainment (0.46)
MoMoE: Mixture of Moderation Experts Framework for AI-Assisted Online Governance
Goyal, Agam, Zhan, Xianyang, Chen, Yilun, Saha, Koustuv, Chandrasekharan, Eshwar
Large language models (LLMs) have shown great potential in flagging harmful content in online communities. Yet, existing approaches for moderation require a separate model for every community and are opaque in their decision-making, limiting real-world adoption. We introduce Mixture of Moderation Experts (MoMoE), a modular, cross-community framework that adds post-hoc explanations to scalable content moderation. MoMoE orchestrates four operators -- Allocate, Predict, Aggregate, Explain -- and is instantiated as seven community-specialized experts (MoMoE-Community) and five norm-violation experts (MoMoE-NormVio). On 30 unseen subreddits, the best variants obtain Micro-F1 scores of 0.72 and 0.67, respectively, matching or surpassing strong fine-tuned baselines while consistently producing concise and reliable explanations. Although community-specialized experts deliver the highest peak accuracy, norm-violation experts provide steadier performance across domains. These findings show that MoMoE yields scalable, transparent moderation without needing per-community fine-tuning. More broadly, they suggest that lightweight, explainable expert ensembles can guide future NLP and HCI research on trustworthy human-AI governance of online communities.
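A compact sketch of the four-operator pipeline named in the abstract (Allocate, Predict, Aggregate, Explain). The uniform routing, weighted-average aggregation, and keyword stub experts below are assumptions for illustration; the authors' operators and community-specialized experts are learned models.

    # Each expert maps a comment to P(violation); in practice these would be
    # fine-tuned, community- or norm-violation-specialized classifiers.
    def moderate(experts, comment, explain_fn, threshold=0.5):
        routed = [(name, fn, 1.0 / len(experts)) for name, fn in experts.items()]     # Allocate
        scored = [(name, w, fn(comment)) for name, fn, w in routed]                   # Predict
        p_violation = sum(w * p for _, w, p in scored) / sum(w for _, w, _ in scored) # Aggregate
        verdict = p_violation >= threshold
        rationale = explain_fn(comment, verdict, scored)                              # Explain
        return verdict, p_violation, rationale

    # Toy usage with stub experts and a template-based explainer:
    experts = {
        "community-expert": lambda c: 0.9 if "spam" in c.lower() else 0.1,
        "normvio-expert": lambda c: 0.8 if "idiot" in c.lower() else 0.2,
    }
    explain = lambda c, v, s: (
        f"{'Remove' if v else 'Keep'}: expert scores {[(n, round(p, 2)) for n, _, p in s]}"
    )
    print(moderate(experts, "You absolute idiot, click my spam link", explain))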
- North America > United States > New Mexico > Bernalillo County > Albuquerque (0.04)
- North America > United States > Illinois > Champaign County > Urbana (0.04)
- North America > United States > Georgia > Fulton County > Atlanta (0.04)
- (5 more...)
- Law (0.68)
- Health & Medicine (0.46)
Quantifying Feature Importance for Online Content Moderation
Tessa, Benedetta, Moreo, Alejandro, Cresci, Stefano, Fagni, Tiziano, Sebastiani, Fabrizio
Accurately estimating how users respond to moderation interventions is paramount for developing effective and user-centred moderation strategies. However, this requires a clear understanding of which user characteristics are associated with different behavioural responses, which is the goal of this work. We investigate the informativeness of 753 socio-behavioural, linguistic, relational, and psychological features, in predicting the behavioural changes of 16.8K users affected by a major moderation intervention on Reddit. To reach this goal, we frame the problem in terms of "quantification", a task well-suited to estimating shifts in aggregate user behaviour. We then apply a greedy feature selection strategy with the double goal of (i) identifying the features that are most predictive of changes in user activity, toxicity, and participation diversity, and (ii) estimating their importance. Our results allow identifying a small set of features that are consistently informative across all tasks, and determining that many others are either task-specific or of limited utility altogether. We also find that predictive performance varies according to the task, with changes in activity and toxicity being easier to estimate than changes in diversity. Overall, our results pave the way for the development of accurate systems that predict user reactions to moderation interventions. Furthermore, our findings highlight the complexity of post-moderation user behaviour, and indicate that effective moderation should be tailored not only to user traits but also to the specific objective of the intervention.
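A minimal sketch of the greedy selection loop described above, paired with a simple classify-and-count quantifier as a stand-in for estimating aggregate behavioral shifts. Both the quantifier and the prevalence-error measure are assumptions for illustration, not the authors' exact method.

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    def quantification_error(X_tr, y_tr, X_te, y_te, cols):
        clf = LogisticRegression(max_iter=1000).fit(X_tr[:, cols], y_tr)
        est_prev = clf.predict(X_te[:, cols]).mean()  # classify-and-count prevalence
        return abs(est_prev - y_te.mean())            # absolute prevalence error

    def greedy_select(X_tr, y_tr, X_te, y_te, k=10):
        selected, remaining = [], list(range(X_tr.shape[1]))
        for _ in range(k):
            # Add the single feature that most reduces quantification error;
            # the error drop at each step doubles as an importance estimate.
            errs = {f: quantification_error(X_tr, y_tr, X_te, y_te, selected + [f])
                    for f in remaining}
            best = min(errs, key=errs.get)
            selected.append(best)
            remaining.remove(best)
        return selected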
- North America > United States > Texas > Travis County > Austin (0.14)
- Europe > Italy > Tuscany > Pisa Province > Pisa (0.04)
- North America > United States > Virginia (0.04)
- (3 more...)
- Health & Medicine (0.93)
- Information Technology > Security & Privacy (0.67)
- Media > News (0.67)
SMARTER: A Data-efficient Framework to Improve Toxicity Detection with Explanation via Self-augmenting Large Language Models
Nghiem, Huy, Sachdeva, Advik, Daumé, Hal III
WARNING: This paper contains examples of offensive materials. To address the proliferation of toxic content on social media, we introduce SMARTER, a data-efficient two-stage framework for explainable content moderation using Large Language Models (LLMs). In Stage 1, we leverage LLMs' own outputs to generate synthetic explanations for both correct and incorrect labels, enabling alignment via preference optimization with minimal human supervision. In Stage 2, we refine explanation quality through cross-model training, allowing weaker models to align stylistically and semantically with stronger ones. Experiments on three benchmark tasks -- HateXplain, Latent Hate, and Implicit Hate -- demonstrate that SMARTER enables LLMs to achieve up to a 13.5% macro-F1 improvement over standard few-shot baselines while using only a fraction of the full training data. Our framework offers a scalable strategy for low-resource settings by harnessing LLMs' self-improving capabilities for both classification and explanation.
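To illustrate Stage 1, the sketch below asks an LLM to explain both the correct and an incorrect label for a post, yielding (chosen, rejected) pairs of the shape consumed by DPO-style preference optimization. The prompt wording and pair format are assumptions for illustration, not the paper's exact pipeline.

    from openai import OpenAI

    client = OpenAI()

    def explain(text, label):
        resp = client.chat.completions.create(
            model="gpt-4o-mini",
            messages=[{"role": "user",
                       "content": f"Post: {text}\n"
                                  f"Briefly explain why this post should be labeled '{label}'."}],
        )
        return resp.choices[0].message.content

    def preference_pair(text, gold_label, wrong_label):
        return {
            "prompt": text,
            "chosen": explain(text, gold_label),     # explanation of the correct label
            "rejected": explain(text, wrong_label),  # explanation of the incorrect label
        }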
- Oceania > Australia (0.04)
- North America > United States > Maryland (0.04)
- North America > United States > Florida > Miami-Dade County > Miami (0.04)
- (3 more...)
- Law > Civil Rights & Constitutional Law (0.68)
- Law Enforcement & Public Safety > Terrorism (0.46)
Asking For It: Question-Answering for Predicting Rule Infractions in Online Content Moderation
Samory, Mattia, Pamfile, Diana, To, Andrew, Phadke, Shruti
Online communities rely on a mix of platform policies and community-authored rules to define acceptable behavior and maintain order. However, these rules vary widely across communities, evolve over time, and are enforced inconsistently, posing challenges for transparency, governance, and automation. In this paper, we model the relationship between rules and their enforcement at scale, introducing ModQ, a novel question-answering framework for rule-sensitive content moderation. Unlike prior classification or generation-based approaches, ModQ conditions on the full set of community rules at inference time and identifies which rule best applies to a given comment. We implement two model variants - extractive and multiple-choice QA - and train them on large-scale datasets from Reddit and Lemmy, the latter of which we construct from publicly available moderation logs and rule descriptions. Both models outperform state-of-the-art baselines in identifying moderation-relevant rule violations, while remaining lightweight and interpretable. Notably, ModQ models generalize effectively to unseen communities and rules, supporting low-resource moderation settings and dynamic governance environments.
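A small sketch of the multiple-choice idea as described: condition on the full rule list at inference time and ask which rule, if any, best applies to a comment. Note the swap: ModQ trains dedicated QA models, whereas this sketch prompts a general chat model; the template and answer parsing are assumptions for illustration.

    from openai import OpenAI

    client = OpenAI()

    def best_matching_rule(rules, comment):
        options = "\n".join(f"{i}. {r}" for i, r in enumerate(rules))
        prompt = (
            "Community rules:\n" + options +
            f"\n{len(rules)}. No rule is violated." +
            f"\n\nComment: {comment}\n" +
            "Answer with the number of the single rule that best applies."
        )
        resp = client.chat.completions.create(
            model="gpt-4o-mini",
            messages=[{"role": "user", "content": prompt}],
        )
        digits = "".join(ch for ch in resp.choices[0].message.content if ch.isdigit())
        idx = int(digits) if digits else len(rules)
        return rules[idx] if idx < len(rules) else None  # None = no violation found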
- Information Technology (1.00)
- Law (0.68)
- Information Technology > Communications > Social Media (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Rule-Based Reasoning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.94)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.93)
Communication Bias in Large Language Models: A Regulatory Perspective
Kuenzler, Adrian, Schmid, Stefan
Large language models (LLMs) are a prominent subset of AI, built on advanced neural network architectures that can generate new data, including text, images, and audio. LLMs utilize various technologies to identify patterns in a given set of training data, without requiring explicit instructions about what to look for [12, 35]. LLMs typically assume that the training data follows a probability distribution, and once they have identified existing patterns, they can generate new instances that are similar to the original data. By drawing from and combining training data, LLMs can create new content that transcends the initial dataset [17].
- Asia > China > Hong Kong (0.04)
- Europe > Germany > Berlin (0.04)
- Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
- (4 more...)
- Law > Statutes (1.00)
- Information Technology > Security & Privacy (1.00)
- Government > Regional Government > Europe Government (0.93)
- (2 more...)